Reference image: https://wikidocs.net/images/page/3413/glw.png (source: https://wikidocs.net/3413)
Greedy Layer-Wise?
Train one layer at a time with a single-layer representation learning algorithm, feeding each layer's output to the next layer as its input
2 Intuitions
Expected Values
Two-phase training (unsupervised pretraining, then supervised fine-tuning) vs. one-phase joint training
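A minimal sketch of the two-phase procedure, assuming a small stacked-autoencoder setup with made-up layer sizes, data, and hyperparameters (PyTorch): each layer is first pretrained greedily as a single-layer autoencoder on the representation below it, then the stacked layers are fine-tuned jointly on the supervised task.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 50)          # unlabeled inputs (hypothetical data)
y = torch.randint(0, 3, (256,))   # labels, used only in the fine-tuning phase

sizes = [50, 32, 16]              # input -> hidden1 -> hidden2 (illustrative)
encoders = []

# Phase 1 (unsupervised pretraining): train one layer at a time as a
# single-layer autoencoder that reconstructs the representation below it.
h = X
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        recon = dec(torch.relu(enc(h)))
        loss = ((recon - h) ** 2).mean()
        loss.backward()
        opt.step()
    encoders.append(enc)
    h = torch.relu(enc(h)).detach()   # this layer's features feed the next layer

# Phase 2 (supervised fine-tuning): stack the pretrained encoders,
# add a task-specific head, and train the whole network jointly.
model = nn.Sequential(encoders[0], nn.ReLU(), encoders[1], nn.ReLU(), nn.Linear(sizes[-1], 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    opt.step()
```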
The popularity of unsupervised pretraining has declined
One example problem: transfer learning
In transfer learning, the learner must perform two or more different tasks
Sharing layers
Domain adaptation (sharing higher layers)
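A minimal sketch of layer sharing between tasks, assuming two hypothetical tasks over the same input space (PyTorch): the lower layers form a shared representation and each task keeps its own output head, so gradients from both tasks shape the shared features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
shared = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 16), nn.ReLU())
head_a = nn.Linear(16, 3)    # task A: 3-way classification (illustrative)
head_b = nn.Linear(16, 1)    # task B: regression (illustrative)

params = list(shared.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

x_a, y_a = torch.randn(64, 50), torch.randint(0, 3, (64,))   # task A batch (hypothetical)
x_b, y_b = torch.randn(64, 50), torch.randn(64, 1)           # task B batch (hypothetical)

for _ in range(100):
    opt.zero_grad()
    loss_a = nn.functional.cross_entropy(head_a(shared(x_a)), y_a)
    loss_b = nn.functional.mse_loss(head_b(shared(x_b)), y_b)
    (loss_a + loss_b).backward()   # both tasks update the shared lower layers
    opt.step()
```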
While the phrase "multi-task learning" typically refers to supervised learning tasks, the more general notion of transfer learning is applicable to unsupervised learning and reinforcement learning as well.
The same representation may be useful in both settings
Two examples: one-shot learning and zero-shot (zero-data) learning
Zero-shot Model
$P(y \mid x, T)$, where $x$ is the input and $T$ is a description of the task
If we have a training set containing unsupervised examples of objects that live in the same space as $T$, we may be able to infer the meaning of unseen instances of $T$.
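A minimal sketch of the zero-shot idea, assuming inputs $x$ and task/class descriptions $T$ can be embedded into a shared representation space (the encoders and data below are made up, and in practice would be trained so that matching pairs land close together): an unseen class is scored by comparing the representation of $x$ with the representation of its description, as a stand-in for $P(y \mid x, T)$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f_x = nn.Linear(50, 16)   # maps inputs into the shared space (hypothetical encoder)
f_t = nn.Linear(10, 16)   # maps class/task descriptions into the same space

x = torch.randn(1, 50)    # a test input
T = torch.randn(4, 10)    # descriptions of 4 classes, including ones never seen in training

# Similarity in the shared space acts as a proxy for P(y | x, T).
scores = torch.cosine_similarity(f_x(x), f_t(T), dim=-1)
predicted_class = scores.argmax().item()
print(predicted_class)
```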
Causal Factor -(Representation)-> Feature
Better Representations?
The hypothesis motivating semi-supervised learning: if $y$ is among the causal factors of $x$, then learning $P(x)$ also helps learn $P(y \mid x)$
Distributed Representation?
One-hot representation vs. distributed representation
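A small numerical illustration of the contrast, with made-up concepts and features: a one-hot code gives each concept its own dimension and shares nothing across concepts, while a distributed code describes each concept as a pattern over shared features, so $n$ binary features can in principle distinguish up to $2^n$ concepts.

```python
import numpy as np

concepts = ["cat", "dog", "car", "truck"]   # illustrative concepts

# One-hot: one dimension per concept; no dimension is shared, so nothing
# learned about "cat" transfers to "dog".
one_hot = np.eye(len(concepts))

# Distributed: each concept is a pattern over shared, made-up features
# [is_animal, has_wheels, is_large]; concepts that share underlying factors
# share feature dimensions, and hence parameters and statistical strength.
distributed = np.array([
    [1, 0, 0],   # cat
    [1, 0, 1],   # dog
    [0, 1, 0],   # car
    [0, 1, 1],   # truck
])

# 4 one-hot symbols distinguish only 4 concepts, while 3 binary features
# could in principle distinguish up to 2**3 = 8.
print(one_hot.shape, distributed.shape)
```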
When and Why can there be a statistical advantage from using a distributed representation as part of a learning algorithm?
Why
Some families of functions can be represented by exponentially smaller deep networks than by shallow networks
e.g., generative models
What makes one representation better than another?
Generic Regularization Strategies